Abstract. In this paper, a face emotion is considered as the result of
the composition of multiple concurrent signals, each corresponding to
the movements of a specific facial muscle. These concurrent signals are
represented by means of a set of multi-scale appearance features that
might be correlated with one or more concurrent signals. The extraction
of these appearance features from a sequence of face images yields a set
of time series. This paper proposes to use the dynamics regulating each
appearance feature time series to discriminate among different face
emotions. To this purpose, an ensemble of Hankel matrices corresponding
to the extracted time series is used for emotion classification within a
framework that combines nearest neighbor and a majority vote schema.
Experimental results on a publicly available dataset show that the
adopted representation is promising and yields state-of-the-art accuracy
in emotion classification.

Keywords: Emotion; Face Processing; LTI systems; Hankel Matrix; 
Classification 


1 Introduction 

Emotion recognition deals with the problem of inferring the emotion (e.g., fear,
anger, surprise) given a sequence of face images. It is a challenging problem
because of strong inter-subject variations, especially for some kinds of
emotions (such as fear or sadness), and because reliable feature representations
are difficult to extract under illumination changes, biometric differences, and
head pose changes. Nonetheless, recognition of face expressions and emotions
is of great interest in many fields such as assistive technologies [?], [?],
socially assistive robotics [23], computational behavioral science [?], [?], [?],
and the emerging field of audience measurement [?].

A vast literature on affective computing [?], [?], [?] has shown that an
emotion can be identified by a subset of detected action units. This suggests
that a face emotion results from a combination of movements of various facial
muscles. Therefore, in this paper we assume that a composition of multiple
concurrent signals yields a face emotion. We use a restricted set of appearance
features, computed on a frame-by-frame basis, that may be correlated with one
or more of these concurrent signals. Given a sequence of face images
corresponding to an emotion, the extraction of these appearance features yields
a set of time series, one for each appearance feature. Considering that face
emotions are not instantaneous, we aim at using the dynamics regulating each
sequence of appearance features to discriminate among different emotions.

We propose to model a sequence of face appearance features as the output
of a Linear Time Invariant (LTI) system. Motivated by the success of works in
action recognition [12], [17] that represent action-dynamics in terms of Hankel
matrices, in this paper we explore the use of Hankel matrices to represent
emotion-dynamics. We adopt a multi-scale Haar-like feature based appearance
representation to obtain a set of time series (one for each spatial scale and
Haar-like feature). Hence we represent a sequence of face images by means of an
ensemble of Hankel matrices where each Hankel matrix embeds the dynamics
of one of the extracted Haar-like feature time series. A nearest-neighbor
classifier combined with a majority vote schema is used for classification.

We validated our work on the publicly available extended Cohn-Kanade
dataset [20]. Our experiments show that there is a clear advantage in adopting a
dynamics-based emotion representation over using the raw measurements.
Furthermore, our experiments highlight that the dynamics of different appearance
features contribute differently to emotion recognition. Overall, our novel
emotion representation achieves state-of-the-art accuracy values in
comparison to works that use accurate face landmarks.

The plan of the work is as follows. In Section 2 we present works that are
related to our emotion-dynamics representation. In Section 3 we describe how
we extract a multi-scale face appearance description; Section 4 introduces the
Hankel matrix-based representation and describes how to build an ensemble of
Hankel matrices to describe face appearance dynamics; Section 5 presents details
about the adopted classification framework. Finally, in Sections 6 and 7 we
present experimental results, and conclusions and future directions respectively.


2 Related Work 

Face detection [31], face recognition [37], [?], and facial expression analysis [?]
have been deeply studied in past years, resulting in a vast literature reviewed in
[35], [27]. In this section, we focus on works that embed the temporal structure
of the face image sequence in the feature representation or in the emotion model.

Dynamics-based emotion recognition has been proposed in [5], where horizontal
and vertical movements of tracked landmarks of different face parts such as
eyebrows, eyelids, cheeks, and lip corners, jointly with spatio-temporal
appearance features, are used to describe a sequence of face images. Temporal
changes in the face appearance are described by means of the Complete Local
Binary Patterns from Three Orthogonal Planes (LBP-TOP) [35], and classification
is performed by SVM. While [5] attempts to embed information about the
dynamics at a feature representation level, works such as [19], [28] account for
the temporal structure of the sequences of descriptors in the emotion model.
In [22], a restricted Boltzmann machine with local interactions (LRBM) is used
to capture the spatio-temporal patterns in the data. The RBM is used as a
generative model for data representation instead of feature learning, and data
need to be pre-aligned. In [19], time-series kernel methods are used for
emotional expression estimation using landmark data only. The work shows that
emotion recognition may be done by adopting either the Dynamic Time Warping
(DTW) kernel or the Global Alignment (GA) kernel [5], [3]. In [28], a Bayesian
approach is used to model dynamic facial expression temporal transitions. Facial
appearance representation is computed in terms of Local Binary Patterns (LBP),
and an expression manifold is derived for multiple subjects. A Bayesian temporal
model (similar to an HMM with a non-parametric observation model) of the
manifold is used to represent facial expression dynamics.

Works such as [?], [?] use landmarks located on face parts such as eyes,
eyebrows, nose and mouth to describe an emotion. In [5], a Constrained Local
Model (CLM) is used to estimate facial landmarks and extract a sparse
representation of corresponding image patches. Emotion classification is
performed by least-square SVM. Wang et al. [22] propose to use an Interval
Temporal Bayesian Network (ITBN) to capture the spatial and temporal relations
among the primitive facial events.

Hankel matrices have already been adopted for action recognition in [12],
which adopts a Hankel matrix-based bag-of-words approach, and in [17], which
models an action as a sequence of Hankel matrices and uses a set of HMMs trained
in a discriminative way to model the switching between LTI systems. In [16], we
showed how the dynamics of tracked facial landmarks can be modeled by
means of Hankel matrices and can be used for facial expression analysis.

Whilst it is possible to obtain a reasonably accurate estimate of the face
region [?], getting a reliable estimation of facial landmarks is still an open
problem despite the remarkable progress described in [?], [?]. The adoption of
appearance features extracted from the detected face region to describe an
emotion, as done in [24], [28], [35], [27], might be a convenient choice.
Therefore, in this paper we adopt appearance features to represent a face
expression. In contrast to [16], we do not model landmark trajectories but we
use an ensemble of Hankel matrices to describe the dynamics of sequences of
appearance features computed at multiple spatial scales. We demonstrate that,
without an accurate estimation of facial landmarks, our novel representation
can achieve state-of-the-art accuracy in emotion recognition.


3 Multi-Scale Face Appearance Representation 


Given a face image, we need to extract a proper appearance representation for
the shown face expression. Considering the success of Haar-like features in face
detection, we adopt this kind of feature to build our face appearance descriptor.

Fig. 1. The set of six Haar-like features used in this paper.



Fig. 2. Haar-like features are extracted from the face region at different spatial scales: 
(a) the face region is detected and cropped; (b) centers of the sliding window used to 
compute the Haar-like features; (c) multiple scales used to calculate Haar-like features. 


Haar-like features resemble Haar wavelets and have been developed by Viola
and Jones for face detection [31]. A Haar-like feature is computed by considering
adjacent rectangular regions in a detection window. The pixel intensities in each
region are summed up and the difference between these sums yields the Haar-like
feature. In [31], Haar-like features are compared against a threshold and used to
detect the face; therefore they are used as weak classifiers and a high number of
features are considered in order to build a strong classifier. The key advantage
of a Haar-like feature over most other features is that it can be calculated in
constant time due to the use of integral images.

A number of Haar-like features have been used in the literature [?], [?], and
Haar-like features and/or simple variations have previously been used for
emotion recognition [27], [33], [34] within boosting approaches.

In this paper we only use the six most common features depicted in Figure 1.
Intuitively, a multi-scale approach might account for different intensities of
the emotion, which may change from subject to subject. Therefore we extract
Haar-like features at different spatial scales. In this preliminary work, we do
not model the weights of each extracted feature; modeling these weights or
performing feature selection remains a topic of future investigation. The main
steps we perform to extract our face appearance representation are:

— we detect the face region (as shown in Fig. 2(a));

— within the face region, we consider a set of uniformly sampled points (red
dots in Fig. 2(b));

— we center windows of varying spatial scales at each of these sampled points
(Fig. 2(c) shows the windows centered at a representative point on the
subject's nose; each color indicates a different scale);

— we extract our Haar-like features from each of the selected windows.
Whenever the sliding window exceeds the size of the face region (especially
along the boundary), the window is cropped so as to consider only the pixels
within the face area. In our implementation, the white and black rectangular
regions of each Haar-like feature are computed in proportion to the window
size, therefore the cropping does not affect the computation of the Haar-like
features.
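
The extraction steps listed in this section can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: it assumes NumPy, a single hypothetical two-rectangle (left minus right) feature, square windows, and sampling centers placed at 10%, 20%, ..., 90% of the face size along each axis (an assumption that gives 9 x 9 = 81 values per frame). The function names are illustrative.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a leading row and column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] from four table lookups."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_feature(ii, cy, cx, size, shape):
    """Hypothetical left-minus-right feature in a square window centered at
    (cy, cx). The window is cropped to the face region, and the two
    rectangles are split in proportion to the cropped window."""
    h, w = shape
    half = size // 2
    top, bottom = max(cy - half, 0), min(cy + half, h)
    left, right = max(cx - half, 0), min(cx + half, w)
    mid = (left + right) // 2
    return rect_sum(ii, top, left, bottom, mid) - rect_sum(ii, top, mid, bottom, right)

def appearance_vector(face, scale, step=0.1):
    """One feature value per sampled center; scale and step are fractions
    of the face size (assumed sampling layout)."""
    h, w = face.shape
    ii = integral_image(face)
    size = int(round(scale * min(h, w)))
    n = int(round(1.0 / step))
    ys = [int(round(k * step * h)) for k in range(1, n)]
    xs = [int(round(k * step * w)) for k in range(1, n)]
    return np.array([two_rect_feature(ii, y, x, size, (h, w))
                     for y in ys for x in xs])
```

Calling `appearance_vector(face, 0.3)` on every frame of a sequence would give one time series for this feature at the 30% scale; repeating the call for each feature type and scale gives the full set of time series.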


4 Ensemble of Hankel Matrices for Emotion-Dynamics

In this section, we first briefly review LTI systems and Hankel matrices, then we
describe our ensemble of Hankel matrices for emotion-dynamics representation.


4.1 Hankel Matrix-based Dynamics Representation 

In an LTI system, two linear equations regulate the behavior of the system:

$$
\begin{aligned}
x_{k+1} &= A \cdot x_k + w_k, \\
y_k &= C \cdot x_k.
\end{aligned}
\tag{1}
$$

The first equation is known as the state equation and involves the variable
$x_k \in \mathbb{R}^u$, which represents the $u$-dimensional internal state of
the LTI system. The second equation is known as the measurement equation and
provides a link between the state of the system $x_k$ and the $u$-dimensional
observable measurement $y_k$. In such equations the matrices $A$ and $C$ are
constant over time, and $w_k \sim \mathcal{N}(0, Q)$ is uncorrelated zero-mean
Gaussian measurement noise.
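
As a small illustration of Eq. 1, the sketch below simulates the output of an LTI system. It is only an example under stated assumptions: NumPy is assumed, the function name `simulate_lti` is hypothetical, and the noise covariance is taken to be isotropic ($Q = \sigma^2 I$), which the text does not specify.

```python
import numpy as np

def simulate_lti(A, C, x0, steps, noise_std=0.0, seed=0):
    """Generate y_0, ..., y_{steps-1} from x_{k+1} = A x_k + w_k, y_k = C x_k.
    Assumption: w_k is drawn from N(0, noise_std**2 * I)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=np.float64)
    outputs = []
    for _ in range(steps):
        outputs.append(C @ x)                      # measurement equation
        w = noise_std * rng.standard_normal(x.shape)
        x = A @ x + w                              # state equation
    return np.stack(outputs)                       # shape (steps, output_dim)
```

For instance, with `A` a 2 x 2 rotation matrix and `C = [[1, 0]]`, the noiseless output is a sampled cosine.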

It is well known [30] that, given a sequence of output measurements
$[y_0, \ldots, y_\tau]$ from Eq. 1, its associated truncated block-Hankel matrix is

$$
H = \begin{bmatrix}
y_0 & y_1 & y_2 & \cdots & y_m \\
y_1 & y_2 & y_3 & \cdots & y_{m+1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
y_n & y_{n+1} & y_{n+2} & \cdots & y_\tau
\end{bmatrix}
\tag{2}
$$

where $n$ is the maximal order of the system, $\tau$ is the temporal length of the
sequence, and it holds that $\tau = n + m - 1$.

The Hankel matrix embeds the observability matrix $\Gamma$ of the system, since
$H = \Gamma \cdot X$, where $X = [x_0, x_1, \ldots, x_\tau]$ is a matrix formed by
the sequence of internal states of the LTI system.
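
A block-Hankel matrix of the kind in Eq. 2 can be built by stacking shifted copies of the sequence. The sketch below is one possible construction, assuming NumPy and a sequence stored as a `(T, d)` array; the name `block_hankel` and the parameter `n_block_rows` (the number of stacked block rows) are illustrative choices, not notation from the paper.

```python
import numpy as np

def block_hankel(Y, n_block_rows):
    """Y has shape (T, d): one d-dimensional measurement per frame.
    Column k of the result stacks Y[k], Y[k+1], ..., Y[k+n_block_rows-1],
    so the output has shape (n_block_rows * d, T - n_block_rows + 1)."""
    Y = np.asarray(Y, dtype=np.float64)
    T, d = Y.shape
    n_cols = T - n_block_rows + 1
    if n_cols < 1:
        raise ValueError("sequence too short for the requested block rows")
    columns = [Y[k:k + n_block_rows].reshape(-1) for k in range(n_cols)]
    return np.stack(columns, axis=1)
```

Every anti-diagonal of blocks holds the same measurement, which is the defining Hankel structure; longer sequences only add columns, so matrices built from sequences of different lengths keep the same number of rows.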

As previously done in [12], [16], we normalize the Hankel matrix $H$ as follows:

$$
\hat{H} = \frac{H}{\sqrt{\lVert H \cdot H^T \rVert_F}}
\tag{3}
$$

and compare two normalized Hankel matrices $\hat{H}_p$ and $\hat{H}_q$ by the
following similarity score:

$$
s(\hat{H}_p, \hat{H}_q) = \lVert \hat{H}_p^T \cdot \hat{H}_q \rVert_F,
\tag{4}
$$

which can be easily derived from the dissimilarity score in [12]. We have
experimentally found that our similarity score is numerically more stable and
faster to compute than the dissimilarity score. Such a score can be regarded as
an approximation of the cosine of the subspace angle between the spaces spanned
by the columns of the Hankel matrices. As such, it can convey the degree to which
two Hankel matrices may correspond to the same dynamical system.
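
The normalization of Eq. 3 and the similarity score of Eq. 4 can be sketched in a few lines. This is an illustration only: it assumes NumPy and reads Eq. 4 as the Frobenius norm of $\hat{H}_p^T \hat{H}_q$, a product that is defined whenever the two matrices have the same number of rows, even if the sequences differ in length. The function names are hypothetical.

```python
import numpy as np

def normalize_hankel(H):
    """Eq. 3: divide H by the square root of the Frobenius norm of H H^T."""
    return H / np.sqrt(np.linalg.norm(H @ H.T, ord="fro"))

def similarity(Hp, Hq):
    """Eq. 4 (as read here): Frobenius norm of Hp^T Hq, for normalized inputs.
    Hp and Hq must have the same number of rows."""
    return np.linalg.norm(Hp.T @ Hq, ord="fro")
```

With this normalization a matrix compared with itself scores 1, and the score of any other pair cannot exceed 1, which fits the cosine interpretation given above.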

4.2 Emotion-Dynamics Representation

The simple and fast appearance feature extraction described in Section 3 yields
a set of time series $Y = \{y^{i,j}\}_{i=1,\ldots,N;\ j=1,\ldots,S}$, where
$y^{i,j} = \{y^{i,j}_1, \ldots, y^{i,j}_\tau\}$ is the time series corresponding
to the $i$-th Haar-like feature at the $j$-th spatial scale ($N$ is the number
of Haar-like features, and $S$ is the number of scales). Each element
$y^{i,j}_t$ of this time series is a vector of features computed at the
uniformly sampled points and representing the $t$-th face in the face image
sequence.

We use the set of time series $Y$ to build an ensemble of Hankel matrices
$\mathcal{H} = \{H^{i,j}\}_{i=1,\ldots,N;\ j=1,\ldots,S}$, where each Hankel
matrix $H^{i,j}$ is built upon the time series $y^{i,j}$ and, therefore, is
associated with the $i$-th Haar-like feature and the $j$-th spatial scale.
Before calculating the Hankel matrix, the sequence $y^{i,j}$ is made zero mean.
We note the following:

— each vector $y^{i,j}_t$ is an ordered set of appearance features extracted
from different parts of the face region. The set of Hankel matrices $H^{i,j}$
captures the dynamics of the Haar-like features over the whole face;

— each Hankel matrix is built upon a single Haar-like feature;

— each Hankel matrix is built upon a single scale;

— modeling separately Haar-like features at different spatial scales has
computational advantages in terms of memory and time complexity;

— Hankel matrices can be obtained by a simple and fast reordering of the
elements in the vectors $y^{i,j}_t$. Therefore, from a computational point of
view, the adoption of Hankel matrices over other time series representations is
particularly appealing.
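
Building the ensemble described in this section amounts to one Hankel matrix per (feature, scale) pair. The sketch below assumes NumPy, a dictionary keyed by hypothetical `(feature_index, scale_index)` tuples, zero-meaning along the time axis (one reading of "made zero mean"), and two block rows per matrix (one reading of "order 2" in the experiments). It is an illustration, not the authors' code.

```python
import numpy as np

def hankel_ensemble(series, n_block_rows=2):
    """series maps (feature_idx, scale_idx) -> array of shape (T, d).
    Returns a dict with one normalized block-Hankel matrix per key."""
    ensemble = {}
    for key, Y in series.items():
        Y = np.asarray(Y, dtype=np.float64)
        Y = Y - Y.mean(axis=0, keepdims=True)   # zero mean over time (assumed)
        T = Y.shape[0]
        cols = [Y[k:k + n_block_rows].reshape(-1)
                for k in range(T - n_block_rows + 1)]
        H = np.stack(cols, axis=1)
        # Normalization as in Eq. 3; a constant sequence would give a zero
        # matrix and a division by zero, which this sketch does not handle.
        ensemble[key] = H / np.sqrt(np.linalg.norm(H @ H.T, ord="fro"))
    return ensemble
```

With six features, five scales, and 81 sampled points, a 10-frame sequence would give 30 matrices of size 162 x 9.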

5 Emotion Classification 

To test the effectiveness of our novel representation we have adopted the simple
and widely used nearest-neighbor classifier (NN). We compare Hankel matrices
by using the similarity score in Eq. 4. Given an ensemble of Hankel matrices,
each Hankel matrix contributes to the emotion classification by voting for a class
(predicted by NN). Hankel matrices are compared on equal terms of Haar-like
feature and scale (we compare only Hankel matrices that share the same scale
and Haar-like feature). The predicted class is decided by means of a majority
vote schema.
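
The voting scheme can be sketched as below, assuming NumPy, ensembles stored as dictionaries keyed by (feature, scale) pairs, and already normalized Hankel matrices. Because Eq. 4 is a similarity, the nearest neighbor is taken as the training sample with the highest score. Tie-breaking between classes with equal votes is not discussed in the text; here the class that reached the tied count first wins, which is an arbitrary choice.

```python
from collections import Counter
import numpy as np

def classify(query, train, labels):
    """query: dict key -> normalized Hankel matrix of the test sequence.
    train: list of such dicts; labels: class label of each training dict.
    Each (feature, scale) key casts one nearest-neighbor vote."""
    votes = []
    for key, Hq in query.items():
        scores = [np.linalg.norm(Ht[key].T @ Hq, ord="fro") for Ht in train]
        votes.append(labels[int(np.argmax(scores))])
    return Counter(votes).most_common(1)[0][0]
```

In a leave-one-subject-out protocol, `train` would hold the ensembles of all sequences from the other subjects.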

Other classification frameworks might be used, such as an LTI system
codebook-based representation similar to that proposed in [12], or a state-based
approach similar to that in [17]. Alternatively, system identification
techniques such as the ones applied in [?], [20] can be adopted at the cost of an
increased overall time complexity. Even if stronger classification frameworks
might be adopted as well, NN allows us to study the effectiveness of our
representation without introducing further classifier-dependent parameters.

6 Experimental Results 

We have performed experiments in emotion recognition on the widely adopted
Extended Cohn-Kanade dataset (CK+) [20]. This dataset provides facial
expressions of 210 adults. Participants were instructed to perform several facial
displays representing either single or combinations of action units. Based on the
coded action units and by means of a validation procedure of the assigned
label, the segmented recordings of the participants' emotions were classified
into 7 categories (in brackets the number of available samples): angry (45),
contempt (18), disgust (59), fear (25), happy (69), sadness (28), surprise (83).
In total there are 327 sequences of the 7 annotated emotions, performed by 118
different individuals. The number of frames of these sequences ranges in [6, 71]
with an average value of about 18 ± 8.6. The dataset provides landmark tracking
results obtained by an active appearance model, which we use in our experiments
only to detect the face region. We adopted the validation protocol suggested in
[20], which is leave-one-subject-out cross validation.

Features          Method            An.    Con.   Disg.  Fear   Hap.   Sad    Surp.  Avg
----------------------------------------------------------------------------------------
Single feature    DTW + NN          37.8   55.6   55.9   16     73.9   21.4   73.5   47.7
Single feature    DTW + NN          40     38.9   32.2   20     69.6   10.7   54.2   37.9
Single feature    DTW + NN          40     44.4   22     20     63.8   14.3   50.6   36.4
Single feature    DTW + NN          42.2   66.7   62.7   12     78.3   10.1   73.5   49.4
Single feature    DTW + NN          35.6   38.9   54.2   12     65.2   10.7   66.3   40.4
Single feature    DTW + NN          57.8   61.1   59.3   16     68.1   14.3   72.3   49.8
Feature pair      DTW + NN          53.3   55.6   62.7   16     72.5   10.7   81.9   50.4
Feature pair      DTW + NN          46.7   72.2   52.5   20     79.7   7.1    67.5   49.4
Feature pair      DTW + NN          48.6   55.6   50.8   24     78.3   17.9   65.1   48.6
Feature pair      DTW + NN          60     55.6   59.3   16     76.8   14.3   80.7   51.8
Feature pair      DTW + NN          53.3   66.7   57.6   20     78.3   7.1    79.5   51.8
Feature pair      DTW + NN          44.4   61.1   50.8   24     84.1   10.7   73.5   49.8
All               DTW + NN          42.2   72.2   59.3   20     87     14.3   83.1   54
----------------------------------------------------------------------------------------
Single feature    Hankel + NN       62.2   72.2   88.1   40     100    42.9   92.8   71.2
Single feature    Hankel + NN       71.1   61.1   81.4   44     94.2   64.3   87.9   72
Single feature    Hankel + NN       57.8   61.1   81.4   44     97.1   53.6   84.3   68.5
Single feature    Hankel + NN       44.4   66.7   84.7   40     98.5   21.4   94     64.3
Single feature    Hankel + NN       77.8   83.3   83     48     97.1   42.9   90.4   74.6
Single feature    Hankel + NN       71.1   77.8   91.5   48     100    60.7   96.4   77.9
Feature pair      Hankel + NN       68.9   77.8   93.2   44     100    57.1   96.4   76.8
Feature pair      Hankel + NN       82.2   83.3   91.5   44     100    78.6   91.6   81.6
Feature pair      Hankel + NN       75.6   83.3   89.8   48     100    71.4   92.8   80.1
Feature pair      Hankel + NN       60     72.2   89.8   40     100    53.6   94     72.8
Feature pair      Hankel + NN       84.4   77.8   89.8   56     100    64.3   95.2   81.1
Feature pair      Hankel + NN       62.2   77.8   89.8   44     100    57.1   91.6   74.6
All               Hankel + NN       86.7   83.3   96.6   52     100    71.4   97.6   83.9
----------------------------------------------------------------------------------------
CAPP              SVM [20]          70     21.9   94.7   21.7   100    60     98.7   66.7
LDN               RBF-SVM [24]**    71.7   73.7   93.4   90.5   95.8   78.9   97.6   85.9
----------------------------------------------------------------------------------------
Shape (SPTS)      SVM [20]          35     25     68.4   21.7   98.4   4      100    50.4
Shape+CAPP        SVM [?]           70.1   52.4   92.5   72.1   94.2   45.9   93.6   74.4
Shape             ITBN [32]         91.1   78.6   94     83.3   89.8   76     91.3   86.3
Shape             LRBM [22]         97.8   72.2   89.8   84     100    78.6   97.6   88.6
Shape + Hankel    NN [16]           91.1   83.3   94.9   84     100    71.4   98.8   89.1

Table 1. Accuracy in Emotion Classification on the CK+ dataset. Red font indicates
the best accuracy value per emotion, while bold font highlights the second best
performance. **Different validation protocol (10-fold cross validation)

Tr. vs Pr.   Angry   Contempt  Disgust  Fear   Happy  Sadness  Surprise
-----------------------------------------------------------------------
Angry        86.67   0         2.22     2.22   0      6.67     2.22
Contempt     0       83.33     0        0      5.56   5.56     5.56
Disgust      0       0         96.61    0      1.69   0        1.69
Fear         12      4         0        52     24     4        4
Happy        0       0         0        0      100    0        0
Sadness      7.14    3.57      0        0      3.57   71.43    14.29
Surprise     0       1.20      0        0      1.20   0        97.59

Table 2. Confusion Matrix on the CK+ dataset when all the six Haar-like features
are used. True labels are on rows, and predicted labels are on columns.

When extracting the Haar-like features, we sample the center location
uniformly with a step equal to 10% of the size of the detected face region,
yielding an 81-dimensional vector for each Haar-like feature. The spatial scales
(size of the window used to calculate the Haar-like feature) are also computed
in proportion to the face region size, and the percentage takes values in
{30, 35, 40, 50, 60}. The order of each Hankel matrix has been empirically set
to 2. To extract the Haar-like features we have modified the implementation
used in [?], [?].

6.1 Results 

We have performed an extensive validation of the dynamics-based emotion
representations, whose results are reported in Table 1. The table reports the
per-class classification accuracy values for each of the emotion classes, and
the average accuracy. The table is divided into four parts. The first part
presents classification accuracy values when the raw features are adopted
(namely Haar-like features). In this case, as the face image sequences have
different lengths, dynamic time warping (DTW) is used to align the sequences and
a nearest-neighbor classifier is used over the aligned sequences. For a fair
comparison, also when adopting the raw features, different Haar-like features
and spatial scales are compared separately and a majority vote schema is used
to predict the final class.
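
For reference, a textbook dynamic time warping distance between two sequences of feature vectors is sketched below. The paper does not state the local cost or step pattern used for the baseline, so the Euclidean local cost and the standard three-way recursion here are assumptions, and `dtw_distance` is a hypothetical name.

```python
import numpy as np

def dtw_distance(A, B):
    """Classic DTW between sequences A (n, d) and B (m, d).
    Assumptions: Euclidean local cost, steps (i-1, j), (i, j-1), (i-1, j-1)."""
    A = np.asarray(A, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A nearest-neighbor baseline would assign to each test sequence the label of the training sequence with the smallest such distance, separately for every feature and scale, before the majority vote.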

The second part of the table presents results when an ensemble of Hankel 
matrices is used. Both the first and second part of the table report performance 
when a single Haar-like feature is used, when a pair of Haar-like features is used 
and, finally, when all the six Haar-like features are used. 

By comparing the first and second part of the table, there is a clear advantage
in using an ensemble of Hankel matrices to represent the emotions over using
the Haar-like features directly. On average, the relative increase in accuracy
obtained with the dynamics-based representation with respect to the raw
measurements is about 60.3%.
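
One reading that reproduces this figure from Table 1 is the mean, over the 13 feature configurations, of the relative gain of the Hankel + NN average accuracy over the corresponding DTW + NN average accuracy. The short check below uses only the "Avg" column of the table; the pairing of rows by position is an assumption about how the figure was obtained.

```python
# "Avg" column of Table 1, rows in table order (6 single, 6 pairs, all).
dtw_avg = [47.7, 37.9, 36.4, 49.4, 40.4, 49.8,
           50.4, 49.4, 48.6, 51.8, 51.8, 49.8, 54.0]
hankel_avg = [71.2, 72.0, 68.5, 64.3, 74.6, 77.9,
              76.8, 81.6, 80.1, 72.8, 81.1, 74.6, 83.9]

# Relative gain (in percent) for each configuration, then the mean.
relative_gain = [100.0 * (h - d) / d for d, h in zip(dtw_avg, hankel_avg)]
mean_relative_gain = sum(relative_gain) / len(relative_gain)
print(round(mean_relative_gain, 1))
```

Under this reading the gain is relative, not a difference in percentage points: the absolute gap between the two "all features" rows is 83.9 - 54 = 29.9 points.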

Looking at the performance of each single Haar-like feature, the most
informative one is the concentric squared regions (the last of the six features).
Therefore we have performed experiments to study the performance of this
feature when coupled with another Haar-like feature. As the table shows, there is
an improvement with three of the five Haar-like features. There is no
improvement when the Haar-like feature is coupled with the first Haar-like
feature and a degradation of the performance when coupled with the vertical
bands Haar-like feature. What is striking is that, whenever two or more
Haar-like features are combined, the emotion Happy is correctly recognized 100%
of the time. This suggests that our ensemble of Hankel matrices can be
appropriate for smile detection. A further improvement of the performance is
obtained when all the Haar-like features are used together, at the cost of a
higher computational complexity. We suspect that not all the features are
actually contributing to the recognition of the emotion, and feature and scale
selection techniques may help to achieve more accurate results.

The third part of the table reports accuracy values of state-of-the-art
methods adopting only appearance features. The class for which our method seems
to fail the most is the emotion Fear. If we ignore this class, our method achieves
even better accuracy values than the most competitive method in [24]. For
completeness, the fourth part of the table reports the performance of techniques
adopting accurate estimation of facial landmarks (provided together with the
dataset). Even if these methods are not directly comparable with the ones that
use only appearance information, we note that our appearance-based
representation already competes very well against these techniques.

Finally, Table 2 reports the confusion matrix of our method. The class Fear
is confused mostly with the class Happy. Some confusion is also present
between the classes Sadness and Surprise. We believe that these ambiguities
might be resolved with fine-grained appearance descriptors, such as the Local
Directional Number (LDN) pattern introduced in [24].

7 Conclusions and Future Work 

In this paper we have proposed to use an ensemble of Hankel matrices to
represent the dynamics of face appearance features, where each Hankel matrix
embeds the dynamics of a single appearance feature at a given spatial scale. We
have tested our novel emotion representation on a widely used publicly
available benchmark (CK+). Our experiments demonstrate that, on equal terms of
classification framework and feature representations, the dynamics-based
emotion representation achieves a relative increase of about 60.3% in the
accuracy values with respect to using the raw measurements directly. Overall,
our approach achieves competitive performance with respect to more
sophisticated machinery or methods that use accurate shape information.
Our formulation is general and it is not limited to the adopted face
appearance representation. We therefore aim at extending our work by considering
other appearance features. Moreover, we believe that feature and scale selection
techniques (e.g., boosting) might lead to an increase of the accuracy of our
approach. In this paper, we have focused on the problem of classifying segmented
emotion sequences. In future work we aim at tackling the problem of emotion
intensity estimation and emotion detection in face image sequences. In this
sense, we will explore how face appearance feature dynamics correlate with the
intensity of face emotions and whether they can help in detecting subtle changes
in face expressions.

Acknowledgments This work was partially supported by Italian MIUR grant 
PON0101687, SINTESYS - Security and INTElligence SYStem. 